Hierarchical sets: analyzing pangenome structure through scalable set visualizations

Author

  • Thomas Lin Pedersen
Abstract

Motivation: The increase in available microbial genome sequences has resulted in an increase in the size of the pangenomes being analyzed. Current pangenome visualizations are not designed for the pangenome sizes possible today, and new approaches are needed to convert the increase in available information into an increase in knowledge. As the pangenome data structure is essentially a collection of sets, we explore the potential of scalable set visualization as a tool for pangenome analysis.

Results: We present a new hierarchical clustering algorithm based on set arithmetic that optimizes the intersection sizes along the branches. The intersection and union sizes along the hierarchy are visualized using a composite dendrogram and icicle plot, which, in a pangenome context, shows the evolution of pangenome and core size along the evolutionary hierarchy. Outlying elements, i.e. elements whose presence patterns do not correspond with the hierarchy, can be visualized using hierarchical edge bundles. When applied to pangenome data, this plot shows putative horizontal gene transfers between the genomes and can highlight relationships between genomes that are not represented by the hierarchy. We illustrate the utility of hierarchical sets by applying it to a pangenome based on 113 Escherichia and Shigella genomes and find that it provides a powerful addition to pangenome analysis.

Availability and Implementation: The described clustering algorithm and visualizations are implemented in the hierarchicalSets R package available from CRAN ( https://cran.r-project.org/web/packages/hierarchicalSets ).

Contact: [email protected].

Supplementary information: Supplementary data are available at Bioinformatics online.
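The clustering idea in the abstract can be illustrated with a small Python sketch. It is not the hierarchicalSets implementation: the merge criterion used here (join the pair of clusters whose merged intersection-to-union ratio is largest, and stop when no pair shares an element) is an assumption chosen to illustrate "optimizing intersection sizes along the branches", and the function name `cluster_sets` and the toy gene identifiers are hypothetical.

```python
from itertools import combinations


def cluster_sets(sets):
    """Greedy agglomerative clustering of named sets (illustrative only).

    Each cluster is a tuple (tree, core, pan), where core is the
    intersection and pan is the union of all member sets.
    Assumed criterion: merge the pair with the largest |core| / |pan|.
    Merging stops when no two clusters share an element, so the result
    may contain several independent trees.
    """
    clusters = [(name, frozenset(s), frozenset(s)) for name, s in sets.items()]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            core = clusters[i][1] & clusters[j][1]
            if not core:
                continue  # never join clusters with an empty intersection
            pan = clusters[i][2] | clusters[j][2]
            score = len(core) / len(pan)
            if best is None or score > best[0]:
                best = (score, i, j, core, pan)
        if best is None:
            break  # no remaining pair overlaps
        _, i, j, core, pan = best
        merged = ((clusters[i][0], clusters[j][0]), core, pan)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Toy "pangenome": genome name -> set of gene-group identifiers (made up).
genomes = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2", "g3", "g5"},
    "C": {"g1", "g2", "g6"},
    "D": {"g7", "g8"},
}
result = cluster_sets(genomes)
for tree, core, pan in result:
    print(tree, "core size:", len(core), "pan size:", len(pan))
```

In this toy input, A and B merge first because they share the most gene groups relative to their union, C then joins them, and D stays separate because it shares nothing with the other genomes. The core and pan sizes recorded at each merge correspond to the intersection and union sizes that the abstract describes visualizing along the hierarchy.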

Similar resources

PanViz: interactive visualization of the structure of functionally annotated pangenomes

Summary PanViz is a novel, interactive, visualization tool for pangenome analysis. PanViz allows visualization of changes in gene group (groups of similar genes across genomes) classification as different subsets of pangenomes are selected, as well as comparisons of individual genomes to pangenomes with gene ontology based navigation of gene groups. Furthermore it allows for rich and complex vi...

Data Exploration with Paired Hierarchical Visualizations: Initial Designs of Pair Trees

Paired hierarchical visualizations (PairTrees) integrate treemaps, node-link diagrams, choropleth maps and other information visualization techniques to support exploration of hierarchical data sets at multiple levels of abstraction (Kules, Shneiderman et al., in press). Coordinated visualizations are an effective way to support exploratory data analysis of multidimensional data sets. Hierarchi...

GET_HOMOLOGUES, a versatile software package for scalable and robust microbial pangenome analysis.

GET_HOMOLOGUES is an open-source software package that builds on popular orthology-calling approaches making highly customizable and detailed pangenome analyses of microorganisms accessible to nonbioinformaticians. It can cluster homologous gene families using the bidirectional best-hit, COGtriangles, or OrthoMCL clustering algorithms. Clustering stringency can be adjusted by scanning the domai...

Uncertainty analysis of hierarchical granular structures for multi-granulation typical hesitant fuzzy approximation space

Hierarchical structures and uncertainty measures are two main aspects in granular computing, approximate reasoning and cognitive process. Typical hesitant fuzzy sets, as a prime extension of fuzzy sets, are more flexible to reflect the hesitance and ambiguity in knowledge representation and decision making. In this paper, we mainly investigate the hierarchical structures and uncertainty measure...

Graphical methods for reducing, visualizing and analyzing large data sets using hierarchical terminologies.

OBJECTIVE To explore new graphical methods for reducing and analyzing large data sets in which the data are coded with a hierarchical terminology. METHODS We use a hierarchical terminology to organize a data set and display it in a graph. We reduce the size and complexity of the data set by considering the terminological structure and the data set itself (using a variety of thresholds) as wel...

Journal title:

Volume 33, Issue

Pages -

Publication date: 2017